fix: anchor/alternation PIKEVM routing + B5/B12 backref support #82

Merged
jbachorik merged 60 commits into main from pr/2-anchor-alt-routing-backref
Jun 19, 2026

@jbachorik jbachorik commented Jun 18, 2026

Collaborator

What does this PR do?

Routes anchor-diluted capturing alternations and quantified-group alternation-priority conflicts to PIKEVM_CAPTURE (the priority-correct thread engine) ahead of the DFA paths, with a compileHybrid pre-check and a revert of over-broad PikeVM promotions. Also adds backref fixes: B5 (guards lazy quantifiers in VARIABLE_CAPTURE_BACKREF so they throw instead of silently producing wrong captures), B7 (zero-length early-accept for nullable groups), and B12 (quantifier-prefix backref bytecode plus isPrefixNodeHandleable for unbounded/exact prefixes).

Motivation

Produce correct captures/matches for pattern classes the DFA strategies mishandled, and make lazy-backref gaps fail loudly instead of silently returning wrong results.

Related Issue(s)

Stacked on PR1. Part of the 2026-06 capture-correctness & performance effort.

Improves the safety of (does not close) #33 (alternation-priority decline → fallback) and #37 (B5 lazy-quantifier guard → throws instead of producing wrong captures).

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring (no functional change)
  • Documentation
  • Test improvement
  • Build/CI change

Checklist

  • I have read the CONTRIBUTING.md guidelines
  • All existing tests pass (./gradlew build)
  • I have added tests for my changes
  • I have updated documentation (if applicable)
  • My commits are signed

Performance Impact

None (routing/correctness).

Additional Notes

Stacked on pr/1-reggieoption-fallback-substrate. Includes a large docs-only commit (dfd070a, plan files). Contains worktree-agent merge commits. Cuts at 1133d2a.

🤖 Generated with Claude Code

jbachorik and others added 30 commits June 11, 2026 17:28
Eliminates B10/B15 FallbackPatternDetector predicates and partially
eliminates B16 by routing the affected DFA_*_WITH_GROUPS patterns to
PIKEVM_CAPTURE before the DFA state-count ladder:

- B10: optional prefix before capturing group (e.g. -?(-?.{3}).)
- B15: capturing group in quantified alternation (e.g. (a|b){2,})
- B16 (partial): nullable outer quantifier on capturing group with
  non-nullable content (e.g. (a)?); patterns where both the outer
  quantifier and group content are nullable (e.g. (0*-?){0,}) still
  fall back to JDK via the new hasNullableGroupContentWithNullableQuantifier
  predicate.

Both the capture-ambiguous TDFA path and the non-ambiguous DFA-with-groups
path now have the three gates before the DFA strategy ladder. Fuzz gate:
findings=0 (9530 patterns, 76240 inputs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
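
For reference, the example patterns named in the commit above can be checked against java.util.regex, which is the behavior the native engines are expected to reproduce. This is a minimal sketch using only the JDK; the class name `JdkCaptureReference` and helper `group1` are hypothetical and not part of Reggie.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JdkCaptureReference {
    /**
     * Returns group 1 of a full match under java.util.regex.
     * Returns "<no match>" when the input does not match, and null when
     * the input matches but group 1 did not participate.
     */
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : "<no match>";
    }

    public static void main(String[] args) {
        // B10: the greedy optional prefix takes the leading '-', so the group starts at 'a'.
        System.out.println(group1("-?(-?.{3}).", "-abcd"));   // abc
        System.out.println(group1("-?(-?.{3}).", "--abcd"));  // -abc
        // B15: a quantified capturing alternation reports its last iteration.
        System.out.println(group1("(a|b){2,}", "ab"));        // b
        // B16: an optional group that matched nothing stays unset.
        System.out.println(group1("(a)?", ""));               // null
    }
}
```

These are the capture values any replacement strategy has to agree with for the B10, B15, and B16 pattern shapes.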
Add PIKEVM gate inside the capturing TDFA isAnchorConditionDiluted()
block: patterns where both branches share a leading character but one
branch carries a start-anchor guard (e.g. ^x|x(y)) now route to
PIKEVM_CAPTURE instead of the JDK fallback. PikeVM evaluates ^/\A
correctly against the search-region origin since commit 0acfc66.

Patterns with optional quantifiers, nullable branches, or leading
end-anchors still fall through to the anchorConditionDiluted JDK path.
Fuzz gate confirms zero divergences with the new routing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
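
The `^x|x(y)` case from the commit above can be illustrated with java.util.regex as the reference. The sketch below assumes that the "search-region origin" corresponds to the JDK's default anchoring bounds, where `^` matches at the region start; that mapping is an assumption, and `AnchorDilutedAlternation` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorDilutedAlternation {
    /** Describes the first find() of ^x|x(y) inside the region [regionStart, input.length()). */
    static String firstFind(String input, int regionStart) {
        Matcher m = Pattern.compile("^x|x(y)").matcher(input);
        m.region(regionStart, input.length()); // anchoring bounds are on by default
        if (!m.find()) {
            return "no match";
        }
        return m.group() + "@" + m.start() + " g1=" + m.group(1);
    }

    public static void main(String[] args) {
        // At the origin the anchored branch wins, so group 1 stays unset.
        System.out.println(firstFind("xy", 0));   // x@0 g1=null
        // Away from the origin only the unanchored branch can match.
        System.out.println(firstFind("axy", 0));  // xy@1 g1=y
        // Moving the region start to index 1 makes ^ match there.
        System.out.println(firstFind("axy", 1));  // x@1 g1=null
    }
}
```

Both branches start with the same character, so which one wins depends only on where the match starts relative to the origin.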
- NFABytecodeGenerator: add zero-length early-accept before bounds/regionMatches
  in generateBackreferenceCheck; groupLen==0 trivially succeeds (vacuous match)
- FallbackPatternDetector: replace broad hasNullableBackrefGroup B7 guard with
  narrowed hasAmbiguouslyNullableBackrefGroup that only falls back when the group
  body can capture strings of length > 1 (unbounded contamination risk); groups
  with max capture length <= 1 (e.g. a?, [x]?) are safe with the early-accept
- BackrefEngineGapsTest: enable b7_nullableBackrefGroupInOptimizedNfa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
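
The zero-length early-accept described above can be sketched in plain Java rather than bytecode. This is an illustrative equivalent under stated assumptions, not the generated code: `BackrefCheckSketch`, `backrefMatches`, and its parameters are hypothetical names, and the captured span is assumed to be set.

```java
import java.util.regex.Pattern;

final class BackrefCheckSketch {
    /**
     * Checks whether the text captured in [groupStart, groupEnd) repeats at pos.
     * An empty capture succeeds before any bounds or regionMatches work is done.
     */
    static boolean backrefMatches(String input, int groupStart, int groupEnd, int pos) {
        int groupLen = groupEnd - groupStart;
        if (groupLen == 0) {
            return true; // early accept: an empty capture matches vacuously
        }
        if (pos + groupLen > input.length()) {
            return false; // not enough input left for the captured text
        }
        return input.regionMatches(pos, input, groupStart, groupLen);
    }

    public static void main(String[] args) {
        System.out.println(backrefMatches("abab", 0, 2, 2)); // true
        System.out.println(backrefMatches("ab", 1, 1, 2));   // true (empty capture at end of input)
        // JDK reference: (a?) captures "" and \1 then matches the empty string.
        System.out.println(Pattern.matches("(a?)\\1b", "b")); // true
    }
}
```

With the early accept in place, a group whose body captures at most one character (such as `a?`) either matches nothing or is compared as a single character.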
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- RuntimeCompiler: replace CapturePolicy import with ReggieOption/UnsupportedPatternException
- Add cacheKeyFor() helper (flag-aware cache key) and fallbackOrThrow() helper
- Gate all 6 JavaRegexFallbackMatcher construction sites behind ALLOW_JDK_FALLBACK flag
- compileHybrid() receives ReggieOptions to propagate fallback policy
- UnsupportedPatternException propagates through catch(Exception) via explicit re-throw
- 34 test files updated: add allowJdkFallback() for patterns requiring JDK fallback
- New FallbackPolicyTest: throwsByDefault, delegatesWhenFallbackEnabled, nativePatternUnaffected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
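
The fallback gating and the explicit re-throw described above might look roughly like the following. Everything here is a self-contained stand-in: the `Option` enum, the nested `UnsupportedPatternException`, and the signatures of `fallbackOrThrow` and `compile` are assumptions made for illustration and do not reflect the actual Reggie API.

```java
import java.util.Set;
import java.util.function.Supplier;

final class FallbackPolicySketch {
    // Stand-in for the option type named in the commit; the real type may differ.
    enum Option { ALLOW_JDK_FALLBACK }

    // Stand-in for the exception type named in the commit.
    static final class UnsupportedPatternException extends RuntimeException {
        UnsupportedPatternException(String message) {
            super(message);
        }
    }

    /** Builds the fallback only when the caller opted in; otherwise fails loudly. */
    static <T> T fallbackOrThrow(Set<Option> options, String pattern, Supplier<T> fallback) {
        if (options.contains(Option.ALLOW_JDK_FALLBACK)) {
            return fallback.get();
        }
        throw new UnsupportedPatternException("no native strategy for: " + pattern);
    }

    /** The policy error is re-thrown explicitly so the broad catch cannot swallow it. */
    static String compile(Set<Option> options, String pattern) {
        try {
            return fallbackOrThrow(options, pattern, () -> "jdk:" + pattern);
        } catch (UnsupportedPatternException e) {
            throw e; // must come before the broad catch below
        } catch (Exception e) {
            return "recovered:" + pattern;
        }
    }
}
```

A caller that passes `Set.of(Option.ALLOW_JDK_FALLBACK)` gets the delegated result, while an empty option set surfaces the exception instead of a silent fallback.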
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…otations

ReggieOption moved from reggie-runtime to reggie-annotations so the
annotation type can reference it without a circular dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik requested a review from Copilot June 19, 2026 08:23
@jbachorik

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b9dc69e9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

jbachorik and others added 2 commits June 19, 2026 22:19
…nt==0 branch

- FallbackPatternDetector.isPrefixNodeHandleable: reject unbounded quantifiers
  (max==-1); greedy prefix loop commits without backtracking so a*(a+)\1 on
  "aa" would fail natively. Routes to fallback engine instead.
- FallbackPatternDetector.hasStringEndAnchorInAltHelper: unwrap non-capturing
  groups before the AnchorNode check so (?:\Z)|abc is treated as a pure-anchor
  branch (same as bare \Z|abc), preventing unnecessary OPTIMIZED_NFA fallback.
- PatternAnalyzer: remove dead nfa.getGroupCount()==0 branch inside the
  nfa.getGroupCount()>0 guard block; zero-group patterns handled outside this block.
- Add regression tests for the above in BackrefEngineGapsTest and
  AnchorAlternationPikeVMTest.
- StrategyCorrectnessMetaTest: clarify OPTIMIZED_NFA representative is JDK-fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
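
The `a*(a+)\1` example from the first bullet above can be made concrete. The sketch below contrasts a hand-written committed (non-backtracking) evaluation with java.util.regex, which backtracks; `GreedyPrefixCommit` and `committedPrefixMatch` are hypothetical and only imitate the failure mode the commit describes.

```java
import java.util.regex.Pattern;

final class GreedyPrefixCommit {
    /** Committed evaluation of a*(a+)\1: the prefix loop never gives characters back. */
    static boolean committedPrefixMatch(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == 'a') {
            i++; // a* takes every leading 'a'
        }
        int groupStart = i;
        while (i < s.length() && s.charAt(i) == 'a') {
            i++; // (a+) finds nothing left, because a* already consumed it
        }
        int groupLen = i - groupStart;
        if (groupLen == 0) {
            return false; // a+ requires at least one 'a'
        }
        return i + groupLen == s.length() && s.regionMatches(i, s, groupStart, groupLen);
    }

    /** Reference behavior: the JDK backtracks a* to "" and (a+) to "a", so \1 matches "a". */
    static boolean jdkMatch(String s) {
        return Pattern.matches("a*(a+)\\1", s);
    }

    public static void main(String[] args) {
        System.out.println(committedPrefixMatch("aa")); // false
        System.out.println(jdkMatch("aa"));             // true
    }
}
```

The disagreement on "aa" is the reason such patterns are routed away from the committed prefix loop.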
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 19, 2026


Pipelines

⚠️ Warnings

🚦 1 Pipeline job failed

CI | build (21)   View in Datadog   GitHub Actions

Commit SHA: bc24db2

@jbachorik

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a76989652b

…BACKREF

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b617a7a1f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 178413f070

Comment thread reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/PikeVMMatcher.java Outdated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c02d9bfa3d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik

Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9fda8c64c

All bytecode generators now handle the full Java $/\Z terminator set:
lone \n (with CRLF guard), lone \r, \r\n pair at end-2, NEL, LS, PS.
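
The terminator set listed above can be written out as a plain-Java position check and compared with java.util.regex. This is a sketch of the rule for the default (non-MULTILINE, non-UNIX_LINES) mode only; `EndAnchorTerminators` and both method names are hypothetical, and the real generators emit bytecode rather than this helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EndAnchorTerminators {
    /** True when \Z can match at pos: at the end of input or just before one final line terminator. */
    static boolean matchesEndAnchor(CharSequence s, int pos) {
        int end = s.length();
        if (pos == end) {
            return true;
        }
        if (pos == end - 2) {
            // Two characters from the end, only a CRLF pair qualifies.
            return s.charAt(pos) == '\r' && s.charAt(pos + 1) == '\n';
        }
        if (pos != end - 1) {
            return false;
        }
        char c = s.charAt(pos);
        if (c == '\n') {
            // CRLF guard: no match between the CR and the LF of a pair.
            return pos == 0 || s.charAt(pos - 1) != '\r';
        }
        // Lone CR, NEL, LS, PS.
        return c == '\r' || c == '\u0085' || c == '\u2028' || c == '\u2029';
    }

    /** JDK reference: whether the first \Z match at or after pos starts exactly at pos. */
    static boolean jdkMatchesAt(String s, int pos) {
        Matcher m = Pattern.compile("\\Z").matcher(s);
        return m.find(pos) && m.start() == pos;
    }

    public static void main(String[] args) {
        String crlf = "a\r\n";
        for (int pos = 0; pos <= crlf.length(); pos++) {
            System.out.println(pos + " " + matchesEndAnchor(crlf, pos) + " " + jdkMatchesAt(crlf, pos));
        }
    }
}
```

For "a\r\n" the anchor holds at index 1 (before the pair) and at index 3 (end of input), but not at index 2 between the CR and the LF.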

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99d6ed726e

@jbachorik

Collaborator Author

@codex review

@jbachorik jbachorik merged commit 21b9bc2 into main Jun 19, 2026
6 of 8 checks passed
@jbachorik jbachorik deleted the pr/2-anchor-alt-routing-backref branch June 19, 2026 23:06
Labels

AI Generated or assisted by AI

4 participants